Novel Prefix Tri-Literal Word Analyser: Rule-Based Approach
Authors
Corresponding Author: Mohammed M. Abu Shquier, Department of Information Science, University of Tabuk, Tabuk, KSA. Email: [email protected]
Abstract
Arabic stemming is a technique for finding the stem or lexical root of Arabic words by eliminating the affixes (prefixes, infixes and suffixes) attached to their roots. Several approaches have been implemented to generate the stem of Arabic words at a given level of analysis, namely the root-based, stem-based and statistical approaches. Arabic is a Semitic language, which means that it is derivational rather than concatenative. In this study we designed and implemented an Arabic triliteral morphological analyser that can effectively analyse both Classical Arabic and Modern Standard Arabic (MSA), and that handles vowelised, semi-vowelised and non-vowelised text. The system can be integrated with other applications so that a large number of users can benefit from it. One shortcoming of the developed system is that the output of the morphological analyser may contain several alternative solutions, which leads to extraction ambiguity.
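The affix-elimination idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analyser: the affix lists and the function names `strip_diacritics` and `candidate_roots` are assumptions made for illustration, infixes are not handled, and the only rule applied is "strip one known prefix and one known suffix, then keep what is three letters long".

```python
import re

# Arabic diacritic marks (tashkeel), U+064B..U+0652; removing them lets
# vowelised and non-vowelised input be treated alike.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Illustrative affix lists only (assumptions, not the paper's rule set).
# The empty string means "no affix stripped".
PREFIXES = ["", "ال", "وال", "و", "ب", "ي", "ت", "م"]
SUFFIXES = ["", "ة", "ون", "ين", "ات", "ها", "ه"]


def strip_diacritics(word: str) -> str:
    """Return the word with short-vowel and related marks removed."""
    return DIACRITICS.sub("", word)


def candidate_roots(word: str) -> list[str]:
    """Return every three-letter remainder reachable by stripping
    at most one listed prefix and at most one listed suffix."""
    bare = strip_diacritics(word)
    candidates = set()
    for prefix in PREFIXES:
        if not bare.startswith(prefix):
            continue
        rest = bare[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if len(stem) == 3:
                candidates.add(stem)
    return sorted(candidates)


if __name__ == "__main__":
    for w in ["يكتبون", "بيته"]:
        print(w, candidate_roots(w))
```

With these lists, `يكتبون` yields the single candidate `كتب`, whereas `بيته` yields two candidates (`بيت` and `يته`). The second case is a small instance of the extraction ambiguity the abstract mentions: a purely rule-based pass can return several alternatives, and choosing among them needs a lexicon or context.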
Similar resources
A Rule based Stemming Method for Multilingual Urdu Text
Urdu is a national language of Pakistan, and more than 200 million people use it for verbal and written communication. There is a large amount of unstructured Urdu textual data in the world, and useful information can be extracted from it by applying data mining techniques. However, Urdu seriously lacks the processing capabilities needed to develop innovative systems based on the language. In this paper, auth...
Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies
Catalan and Spanish are two related languages, given that both derive from Latin. They share similarities at several linguistic levels, including morphology, syntax and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and...
Pseudo-Identities and Bordered Words
This paper investigates the notions of θ-bordered words and θ-unbordered words for various pseudo-identity functions θ. A θ-bordered word is a non-empty word u such that there exists a word v which is a prefix of u while θ(v) is a suffix of u. The case where θ is the identity function corresponds to the classical notions of bordered and unbordered words. Here we explore cases where θ is a pseud...
Containing overgeneration in Zulu computational morphology
The development of a large-coverage, computational morphological analyser for Zulu requires the modelling not only of the regular phenomena often associated with word formation, but also the idiosyncratic behaviour that may occur in Zulu morphology. This paper discusses the application of an existing rule-based, finite-state morphological analyser prototype ZulMorph in semi-automating the minin...
A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions
We present a graph-based model for representing the lexical cohesion of a discourse. In the graph structure, vertices correspond to the content words of a text and edges connecting pairs of words encode how closely the words are related semantically. We show that such a structure can be used to distinguish literal and non-literal usages of multi-word expressions.
Journal title: JCS
Volume 11
Publication date: 2015